DETAILED ACTION



1. This action is responsive to communications: Application, filed on 4/12/2004.
This action is non-final. 

2. Claims 1-25 are pending in this application. Claims 1, 13, 20 and 24 are 
independent claims. 

3. The present title of the invention is "System and method for synchronizing 
samples in a programmable graphics processing unit". 

Claim Rejections - 35 USC § 102 

4. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

(e) the invention was described in (1) an application for patent, published under section 122(b), by 
another filed in the United States before the invention by the applicant for patent or (2) a patent 
granted on an application for patent by another filed in the United States before the invention by the 
applicant for patent, except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application filed in the United States
only if the international application designated the United States and was published under Article 21(2) 
of such treaty in the English language. 

5. Claim 1 is rejected under 35 U.S.C. 102(e) as being anticipated by Puzak et al.
(US 6,560,693). 

As per claim 1, Puzak et al., hereinafter Puzak, discloses a method for 
synchronizing divergent graphics samples in a programmable graphics processing unit, 
the method comprising: 

determining that a divergence has occurred (Figure 10, after Start, Branch
Address Being Decode); 

detecting that a first sample of a group of samples has encountered a first synch 
token ("Each time a branch is decoded, the oldest entry of the PBPQ (702) is checked", 
column 14, line 11-12, where the branch address entry is the synch token);

determining whether any of the other samples of the group has encountered a 
synch token (the PBPQ stores all the branch addresses); and

determining whether the synch token encountered by any of the other samples of 
the group is the first synch token ("If the entry is valid, in step 1002, the address of the 
branch being decoded is compared to the branch address field of the oldest entry of the 
PBPQ", column 14, line 13-15). 
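
For illustration only, the PBPQ check quoted from Puzak above can be sketched as
follows. This is a minimal sketch and not Puzak's implementation: the names
PbpqEntry and oldest_entry_matches, the use of a Python deque, and the convention
that the oldest entry sits at the left end of the queue are assumptions
introduced here.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PbpqEntry:
    """Hypothetical PBPQ entry holding only the two fields the quoted text names."""
    valid: bool
    branch_address: int


def oldest_entry_matches(pbpq: deque, decoded_branch_address: int) -> bool:
    """Check the oldest PBPQ entry each time a branch is decoded.

    Returns True only if the oldest entry is valid and its branch address
    field equals the address of the branch being decoded (cf. step 1002).
    """
    if not pbpq:
        return False
    oldest = pbpq[0]  # assumption: the oldest entry is at the left end
    if not oldest.valid:
        return False
    return oldest.branch_address == decoded_branch_address


# Example queue with two valid entries (addresses are made up).
queue = deque([PbpqEntry(True, 0x4000), PbpqEntry(True, 0x4010)])
```

In this sketch only the oldest entry is compared, which mirrors the quoted
passages; how later entries are handled is not addressed here.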

6. Claim 1 is rejected under 35 U.S.C. 102(b) as being anticipated by Kishi et al. 
(US 6,502,165). 

As per claim 1, Kishi et al., hereinafter Kishi, discloses a method for 
synchronizing divergent graphics samples in a programmable graphics processing unit, 
the method comprising: 

determining that a divergence has occurred ("the library controller 30 of each 
library determines its current idle time status 200", column 11, line 29-30);

detecting that a first sample of a group of samples has encountered a first synch 
token ("The library controller 30 of each data storage library 14-15 provides an 
updatable synchronization token directly associated with each data volume", column 8, 
line 3-line 6); 

determining whether any of the other samples of the group has encountered a
synch token ("The library controller 30 of each data storage library 14-15 provides an 
updatable synchronization token directly associated with each data volume", column 8, 
line 3-line 6); and 

determining whether the synch token encountered by any of the other samples of 
the group is the first synch token ("The director 71-74, upon the determination indicating 
that at least two of the copies of the data volume are at the same fastest available 
access level, compares the provided idle time status of the data storage libraries storing 
those copies, and indicates which library provides the greater idle time status", column 
11, line 46-51). 

7. Claims 1-2, 6-8 and 24-25 are rejected under 35 U.S.C. 102(b) as being 
anticipated by Gupta et al. (US 5,787,272). 

As per claim 1, Gupta et al., hereinafter Gupta, discloses a method for synchronizing 
divergent graphics samples in a programmable graphics processing unit, the method 
comprising: 

determining that a divergence has occurred ("Box 104 identifies shaded and 
unshaded regions ... when a processor reaches a shaded region it will want to
synchronize", column 3, line 13-16); 

detecting that a first sample of a group of samples has encountered a first synch 
token ("WANT_IN is an n-1 bit input for receiving "WANT" bits from the other
processors. The WANT bits will be on when the corresponding processors want to 
synchronize", column 7, line 18-20); 

determining whether any of the other samples of the group has encountered a
synch token ("WANT_IN is an n-1 bit input for receiving "WANT" bits from the other
processors. The WANT bits will be on when the corresponding processors want to 
synchronize", column 7, line 18-20); and 

determining whether the synch token encountered by any of the other samples of 
the group is the first synch token ("The output of match circuit 304 is called "MATCH" 
and is on only when all of the relevant other processors want to synchronize", column 7, 
line 23-25).
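
As a hedged illustration of the quoted MATCH behavior, the sketch below treats
the WANT bits received from the other processors as an integer bit field. The
function name match and the relevant_mask parameter (selecting which other
processors are "relevant") are assumptions made here for illustration; they are
not taken from Gupta.

```python
def match(want_in: int, relevant_mask: int) -> bool:
    """Return True only when every relevant other processor has its WANT bit on.

    want_in:       bit i is 1 when other processor i wants to synchronize.
    relevant_mask: bit i is 1 when other processor i is relevant
                   (hypothetical parameter, not named in the reference).
    """
    return (want_in & relevant_mask) == relevant_mask


# Processors 0 and 2 are relevant; processor 1 is ignored.
all_ready = match(0b101, 0b101)   # both relevant processors want to synchronize
one_missing = match(0b001, 0b101)  # processor 2 has not raised its WANT bit
```

WANT bits from processors outside the mask do not affect the result, so
match(0b111, 0b101) is also true in this sketch.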

8. As per claim 2, Gupta demonstrated all the elements as disclosed in the rejected 
claim 1, and further discloses determining whether to initiate a time out ("STALL is turned on to stop
the processor from executing instructions. WANT_Out is turned on when the respective 
processor wants to synchronize", column 7, line 30-32). 

9. As per claim 6, Gupta demonstrated all the elements as disclosed in the rejected 
claim 1, and further discloses the step of initiating termination steps if the synch token 
encountered by any of the other samples in the group is not the first synch token ("The 
output of match circuit 304 is called "MATCH" and is on only when all of the relevant 
other processors want to synchronize", column 7, line 23-25, where idling is considered 
termination step). 

10. As per claim 7, Gupta demonstrated all the elements as disclosed in the rejected 
claim 1, and further discloses the step of processing the group of samples in
non-divergent mode if the synch token encountered by each of the other samples in the
group is the first synch token (""MATCH" and is on only when all of the relevant other
processors want to synchronize", column 7, line 24-25). 

11. As per claim 8, Gupta demonstrated all the elements as disclosed in the rejected 
claim 1, and further discloses the step of holding the first sample idle once the first
sample has encountered the first synch token ("STALL is turned on to stop the 
processor from executing instructions", column 7, line 30-31). 

12. As per claim 24, Gupta discloses a system for synchronizing divergent graphics 
samples in a programmable graphics processing unit, the system comprising means 
similar to claim 1, and is therefore similarly rejected as claim 1.

13. As per claim 25, Gupta demonstrated all the elements as disclosed in the 
rejected claim 24, and further discloses means similar to claim 7, therefore is similarly 
rejected as claim 7. 

Claim Rejections - 35 USC § 103 

14. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

15. Claims 2-5 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Puzak et al. as applied to claim 1 above, and further in view of Doherty et al. (US 
6,115,083). 

As per claim 2, Puzak demonstrated all the elements as disclosed in the rejected
claim 1. 

Puzak discloses a method for synchronizing divergent samples. It is noted that 
Puzak does not explicitly disclose determining whether to initiate a time out, however, 
this is known in the art as taught by Doherty et al., hereinafter Doherty. Doherty 
discloses a sequence controller in which time out is used to synchronize processes
(column 8, line 18-19). 

Thus, it would have been obvious to one of ordinary skill in the art to incorporate 
the teaching of Doherty into Puzak because Puzak discloses a method to synchronize 
divergent samples and Doherty discloses a sequence controller that uses a time out in
order to synchronize different processors.

16. As per claim 3, Puzak and Doherty demonstrated all the elements as disclosed in 
the rejected claim 2, and Doherty further discloses a time out is initiated if a specified 
amount of time has elapsed and each of the other samples in the group has not yet 
encountered a synch token ("If one processor times out with a pending sync instruction, 
the other can continue executing its program until it encounters the same sync 
instruction", column 8, line 18-20). 

Thus, it would have been obvious to one of ordinary skill in the art to incorporate 
the teaching of Doherty into Puzak because Puzak discloses a method to synchronize 
divergent samples and Doherty discloses a sequence controller that uses a time out in
order to synchronize different processors.

17. As per claim 4, Puzak and Doherty demonstrated all the elements as disclosed in
the rejected claim 2, and Doherty further discloses the step of continuing to wait for 
each of the other samples in the group to encounter a synch token if a time out is not 
initiated ("After a start vector fetch, the processors 42 and 43 are held until both have 
instructions pending and are then started on the same clock", column 8, line 11-13). 

Thus, it would have been obvious to one of ordinary skill in the art to incorporate 
the teaching of Doherty into Puzak because Puzak discloses a method to synchronize 
divergent samples and Doherty discloses a sequence controller that uses a time out in
order to synchronize different processors.

18. As per claim 5, Puzak and Doherty demonstrated all the elements as disclosed in 
the rejected claim 2, and Doherty further discloses the step of initiating termination 
steps if a time out is initiated ("If one processor times out with a pending sync 
instruction, the other can continue executing its program until it encounters the same 
sync instruction", column 8, line 18-20, where the termination step is the same sync 
instruction). 

Thus, it would have been obvious to one of ordinary skill in the art to incorporate 
the teaching of Doherty into Puzak because Puzak discloses a method to synchronize 
divergent samples and Doherty discloses a sequence controller that uses a time out in
order to synchronize different processors.
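
The time out limitations of claims 2-5 as characterized above can be sketched,
under stated assumptions, as a simple tick-based wait. This is not code from
Doherty or from the application: the function name wait_for_group, the
arrival_ticks input format, and the string results are hypothetical and serve
only to show the wait, continue-waiting, and time-out branches side by side.

```python
from typing import Optional, Sequence


def wait_for_group(arrival_ticks: Sequence[Optional[int]], timeout_ticks: int) -> str:
    """Wait for the other samples of a group to reach a synch token.

    arrival_ticks[i] is the tick at which other sample i reaches a synch
    token, or None if it never does (hypothetical input format).
    Returns "synchronized" if all arrive within timeout_ticks, otherwise
    "timed_out".
    """
    for tick in range(timeout_ticks + 1):
        if all(t is not None and t <= tick for t in arrival_ticks):
            return "synchronized"
        # No time out has been initiated yet, so keep waiting (cf. claim 4).
    # The specified amount of time has elapsed and at least one sample has not
    # reached a synch token, so a time out is initiated (cf. claim 3);
    # termination steps would follow here (cf. claim 5).
    return "timed_out"
```

For example, wait_for_group([2, 5], 10) yields "synchronized", while
wait_for_group([2, None], 10) yields "timed_out".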

19. Claims 9 and 10 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Gupta et al. as applied to claim 1 above, and further in view of Yamasaki (US 
6,182,211). 

As per claim 9, Gupta demonstrated all the elements as disclosed in the rejected
claim 1. 

Gupta discloses a method for synchronizing a parallel processing system. It is
noted that Gupta does not explicitly disclose that determining that a divergence has occurred
comprises determining that a first program counter of a plurality of program counters is 
different than a second program counter of the plurality of program counters, each 
program counter of the plurality of program counters corresponding to a different one of 
the samples of the group of samples, however, this is known in the art as taught by 
Yamasaki. Yamasaki discloses a pipelined microprocessor in which "A second program 
counter 104 serves as second address holding means which saves an address of a
subsequent instruction which is subsequent to the conditional branch instruction, which 
is one of pipeline information of the subsequent instruction which is subsequent to the 
conditional branch instruction, before a condition of the conditional branch instruction 
becomes defined. The address adder output 114 and a second program counter 
branch signal 112 are supplied to the second program counter 104, and the second
program counter 104 outputs a second program counter output 115" (column 4, line 26- 
37, wherein the second program counter is different from the first program counter). 

Thus, it would have been obvious to one of ordinary skill in the art to incorporate 
the teaching of Yamasaki into Gupta because Gupta discloses a method of
synchronizing a parallel processing system and Yamasaki discloses that a divergence could be
caused by different program counter content in order to jump to a different address.

20. As per claim 10, Gupta and Yamasaki demonstrated all the elements as
disclosed in the rejected claim 9, and Yamasaki further discloses the second program 
counter results from a conditional branch or a jump. 

Thus, it would have been obvious to one of ordinary skill in the art to incorporate 
the teaching of Yamasaki into Gupta because Gupta discloses a method of
synchronizing a parallel processing system and Yamasaki discloses that a divergence could be
caused by different program counter content in order to jump to a different address.

21. Claims 11 and 12 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Gupta et al. (US 5,787,272) as applied to claim 1 above, and further in view of 
Nguyen (US 7,013,382). 

As per claim 11, Gupta demonstrated all the elements as disclosed in the
rejected claim 1.

Gupta discloses a method for synchronizing a parallel processing system. It is
noted that Gupta does not explicitly disclose that determining that a divergence has occurred
comprises determining that a first subroutine depth of a plurality of subroutine depths is 
different than a second subroutine depth of the plurality of subroutine depths, each 
subroutine depth of the plurality of subroutine depths corresponding to a different one of 
the samples of the group of samples. However, this is known in the art as taught by 
Nguyen. Nguyen discloses a pipeline mechanism in which "Subroutines are invoked by 
a process termed "calling" ... a main routine could call a first subroutine, which itself 
could call a second subroutine, and so on. This hierarchy of multiple subroutine levels is 
called "nested" subroutines" (column 2, line 31-39, where the nested subroutines are
considered to be of different depths).

Thus, it would have been obvious to one of ordinary skill in the art to incorporate 
the teaching of Nguyen into Gupta because Gupta discloses a method for synchronizing 
a parallel processing system and Nguyen discloses different subroutines could be 
organized into different depths in order to avoid undue latency. 

22. As per claim 12, Gupta and Nguyen demonstrated all the elements as disclosed 
in the rejected claim 11, and Nguyen further discloses the first subroutine depth being
different than the second subroutine depth results from a call-return ("Subroutines are
invoked by a process termed "calling"", column 2, line 31).

Thus, it would have been obvious to one of ordinary skill in the art to incorporate 
the teaching of Nguyen into Gupta because Gupta discloses a method for synchronizing 
a parallel processing system and Nguyen discloses different subroutines could be 
organized into different depths in order to avoid undue latency. 

23. Claims 13-15, 17 and 18 are rejected under 35 U.S.C. 102(e) as being 
anticipated by Lindholm et al. (US 7,015,913).

The applied reference has a common assignee with the instant application. 
Based upon the earlier effective U.S. filing date of the reference, it constitutes prior art 
under 35 U.S.C. 102(e). This rejection under 35 U.S.C. 102(e) might be overcome 
either by a showing under 37 CFR 1.132 that any invention disclosed but not claimed in 
the reference was derived from the inventor of this application and is thus not the 
invention "by another," or by an appropriate showing under 37 CFR 1.131. 

24. As per claim 13, Lindholm et al., hereinafter Lindholm, discloses a method for
processing divergent graphics samples in a programmable graphics processing unit, the 
method comprising: 

processing samples of a group of samples in non-divergent mode ("FIG 6 ... 
Instruction Scheduler 430 to schedule the execution of program instructions to process 
several samples", column 13, line 60-42); 

determining whether each program counter of a plurality of program counters is the 
same, each program counter of the plurality of program counters corresponding to a 
different one of the samples of the group of samples ("In one embodiment, instructions 
with equal program counter are considered synchronized", column 14, line 2-3); and 

determining whether each subroutine depth of a plurality of subroutine depths is the 
same, each subroutine depth of the plurality of subroutine depths corresponding to a 
different one of the samples of the group of samples ("In another embodiment, in 
addition to program counters, thread state data such as stack depths, nesting levels, 
subroutine calls, or the like are used to determine two or more threads are 
synchronized", column 14, line 3-7). 
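
The two determinations recited in claim 13, as mapped above, amount to checking
that every sample in the group shares one program counter value and one
subroutine depth. A minimal sketch follows; the names SampleState and
is_non_divergent are hypothetical and are not drawn from Lindholm or from the
application.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SampleState:
    """Per-sample state used for the comparison (hypothetical structure)."""
    program_counter: int
    subroutine_depth: int


def is_non_divergent(samples: Sequence[SampleState]) -> bool:
    """Return True when all program counters are the same and all
    subroutine depths are the same across the group of samples."""
    program_counters = {s.program_counter for s in samples}
    depths = {s.subroutine_depth for s in samples}
    return len(program_counters) <= 1 and len(depths) <= 1


# Same program counter and depth: the group is treated as synchronized.
in_step = [SampleState(40, 1), SampleState(40, 1)]
# Same program counter but different depths: treated as divergent in this sketch.
depth_split = [SampleState(40, 1), SampleState(40, 2)]
```

An empty or single-sample group is treated as non-divergent here, which is a
design choice of the sketch rather than something stated in the reference.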

25. As per claim 14, Lindholm demonstrated all the elements as disclosed in the 
rejected claim 13, and further discloses the step of processing one or more divergent 
samples through a remainder of a program if a first program counter of the plurality of 
program counters is different than a second program counter of the plurality of program 
counters ("In step 545 Execution Unit 470 also updates the program counter associated 
with the thread when a branch or loop instruction is executed and the program counter 
is different than the program counter updated in step 540. In step 547 Execution Unit
470 determines there are no more instructions in the thread, and, if so, return to step 
535", column 13, line 6-11). 

26. As per claim 15, Lindholm demonstrated all the elements as disclosed in the 
rejected claim 14, and further discloses the first program counter being different than 
the second program counter results from a conditional branch or a jump (column 13, 
line 7-8). 

27. As per claim 17, Lindholm demonstrated all the elements as disclosed in the 
rejected claim 13, and further discloses the step of processing one or more divergent 
samples through a remainder of a program if a first subroutine depth of the plurality of
subroutine depths is different than a second subroutine depth of the plurality of 
subroutine depths ("in addition to program counters, thread state data such as stack 
depths, nesting levels, subroutine calls, or the like are used to determine thread age", 
column 9, line 11-14, where stack depth represents subroutine depth). 

28. As per claim 18, Lindholm demonstrated all the elements as disclosed in the 
rejected claim 17, and further discloses the first subroutine depth being different than 
the second subroutine depth relates to a call-return ("in addition to program counters, 
thread state data such as stack depths, nesting levels, subroutine calls, or the like are 
used to determine thread age", column 9, line 11-14, where the subroutine call 
represents a call-return). 

29. Claim 20 is rejected under 35 U.S.C. 102(b) as being anticipated by Rishi et al. 
(US 5,953,530). 

As per claim 20, Rishi et al., hereinafter Rishi, discloses a system for
synchronizing divergent graphics samples in a programmable graphics processing unit, 
the system comprising: 

a plurality of processing threads, each processing thread corresponding to a 
different sample of a group of samples and configured to contain a program counter, a 
subroutine depth and state data ("FIG. 4 depicts a representation multi-processor 
machine configuration which would be typical for use with a multi-threaded target 
program", column 10, line 49-51); and 

a plurality of stacks, each stack corresponding to a different sample of the group of 
samples and configured to store state data in one or more stack levels ("A thread has a 
program counter (PC) and a stack to keep track of local variables and return 
addresses", column 1, line 45-47).

30. Claims 21-23 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Rishi et al. as applied to claim 20 above, and further in view of Cosgrove et al. 
(US 4,399,507).

As per claim 21, Rishi demonstrated all the elements as disclosed in the rejected 
claim 20. 

Rishi discloses a method of synchronizing divergent graphics samples. It is noted that
Rishi does not explicitly disclose wherein the subroutine depth of a first sample is equal 
to the number of the one or more stack levels of a first stack that contain state data, the 
first stack corresponding to the first sample, however, this is known in the art as taught 
by Cosgrove et al., hereinafter Cosgrove. Cosgrove discloses an instruction-pipelined 
processor in which "64 level stack 10 which is addressed with a 6-bit stack Pointer (SP)
28 allows nesting up to 64 levels of Subroutine and Interrupt routines" (column 11, line
59-61).

Thus, it would have been obvious to one of ordinary skill in the art to incorporate 
the teaching of Cosgrove into Rishi because Rishi discloses a method of synchronizing
divergent graphics samples and Cosgrove discloses that the subroutine instruction can be
tracked with a leveled stack in order to track the routines.
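
To make the claim 21 relationship concrete, the sketch below models one
processing thread per sample whose subroutine depth is simply the number of
stack levels currently holding state data. It is an illustration under
assumptions, not the applicant's or either reference's design: the class name
SampleThread, the call and ret methods, and the stored tuple layout are
hypothetical, and the 64-level limit is borrowed from the Cosgrove passage
quoted above.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

MAX_LEVELS = 64  # 64 nesting levels, per the Cosgrove passage quoted above


@dataclass
class SampleThread:
    """One thread per sample: a program counter plus a per-sample stack."""
    program_counter: int = 0
    stack: List[Tuple[int, Any]] = field(default_factory=list)

    @property
    def subroutine_depth(self) -> int:
        # Depth equals the number of stack levels that contain state data.
        return len(self.stack)

    def call(self, target: int, state: Any) -> None:
        """Push a return address and state data, then jump to the subroutine."""
        if len(self.stack) >= MAX_LEVELS:
            raise OverflowError("stack nesting limit reached")
        self.stack.append((self.program_counter + 1, state))
        self.program_counter = target

    def ret(self) -> Any:
        """Pop one stack level, restore the return address, and return the state."""
        return_address, state = self.stack.pop()
        self.program_counter = return_address
        return state


thread = SampleThread()
thread.call(100, {"r0": 1})  # depth becomes 1
thread.call(200, {"r0": 2})  # depth becomes 2
```

Deriving the depth from the stack length, rather than storing it separately,
keeps the two values from drifting apart in this sketch.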

31. As per claim 22, Rishi demonstrated all the elements as disclosed in the rejected
claim 1. 

Rishi discloses a method of synchronizing divergent graphics samples. It is noted that
Rishi does not explicitly disclose wherein each stack resides in a dedicated local 
storage resource, however, this is known in the art as taught by Cosgrove. Cosgrove 
discloses an instruction-pipelined processor in which the stack is stored locally (Figure 5,
item 10). 

Thus, it would have been obvious to one of ordinary skill in the art to incorporate 
the teaching of Cosgrove into Rishi because Rishi discloses a method of synchronizing
divergent graphics samples and Cosgrove discloses that a locally stored stack could be used to track
subroutines in order to conveniently track the routines.

Allowable Subject Matter 

32. Claims 16, 19 and 23 are objected to as being dependent upon a rejected base 
claim, but would be allowable if rewritten in independent form including all of the 
limitations of the base claim and any intervening claims. 

Conclusion



33. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. 



34. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Ryan R Yang whose telephone number is (571) 272- 
7666. The examiner can normally be reached on M-F 8:30AM-5:00PM. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Michael Razavi can be reached on (571) 272-7664. The fax phone number 
for the organization where this application or proceeding is assigned is (571) 273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 